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Abstract 

An interface between semi-empirical methods and the polarized continuum model 
(PCM) of solvation successfully implemented into GAMESS following the approach 
by Chudinov et al (Chem. Phys. 1992, 160, 41). The interface includes energy 
gradients and is parallelized. For large molecules such as ubiquitin a reasonable 
speedup (up to a factor of six) is observed for up to 16 cores. The SCF convergence 
is greatly improved by PCM for proteins compared to the gas phase. 

Introduction 

Continuum solvation models such as the polarized continuum model (PCM) [1] and the 
conductor-like screening model (COSMO) [2] offers a computational efficient model of 
solvation for molecules treated with electronic structure methods. This paper describes 
the implementation of an interface between the conductor-PCM (C-PCM) model [2J I3J 
H] and the NDDO-based semi-empirical methods implemented in GAMESS j3] (MNDO 
[6], AMI [7j, and PM3 [8]). There has been several different implementations of semi- 
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empirical/PCM interfaces [21 |9l [TOj El Q2] and this work follows the implementation 
proposed by Chudinov et al. |9J However, we also implement the corresponding energy- 
gradient terms and both the energy and gradient terms are parallelized and tested on 
relatively large systems such as the protein ubiquitin. 

This paper is organized as follows. 1) We review the relevant expressions for the semi- 
empirical/PCM interface. 2) We present results of solvation free energies and compare 
them to previous results. 3) We test the numerical stability for geometry optimiza- 
tions and vibrational analyses. 4) We present timings and parallelization speed-ups for 
protein-sized systems. 5) We summarize our findings and provide possible ideas for future 
improvements. 

Background and Theory 

In PCM, a molecule (the solute) is placed inside a solvent-cavity usually described by 
introducing interlocked spheres placed on the atoms of the molecule. The solvent is 
described as a polarizable continuum with dielectric constant e. The interaction between 
the solute and the solvent is described by the apparent surface charges (ASCs). The 
PCM equations are solved numerically by dividing the surface area up into a finite set of 
elements called tesserae with a corresponding ASC qi, an area ai and a position r*j. There 
are several implementations of the PCM [T3] and in this study we focus on the conductor- 
like PCM (C-PCM) [21 El S]. For high dielectric solvents such as water C-PCM yields 
nearly identical results to the more generally applicable integral-equation-formalism PCM 
(IEF-PCM) [H] but requires less computational resources. 
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For C-PCM the ASCs q are determined by solving the following matrix-equation 



Cq=-^V. 



(1) 



where the matrix C has the elements 



\rj - n\ 



Cu = 1.07 




(2) 



and V is the potential of the solute in the solvent for each tessera %. The potential V(i) 
on tessera i is given as 



r A - n 



(3) 



where A runs over all nuclei in the solute at position va carrying a charge Za- P is the 
density matrix of the solute and V^ u (i) are the interaction integrals over basis functions 
on a tessera i given as 



i 



rA-n 



U ) = (s's'lfAls) 



(4) 



For NDDO methods the right hand side of equation [4] is the interaction between a point 
charge on the surface (represented as sV in the NDDO approach) and the basis functions 
of the solute molecule on atom A. The (sV|/iz/) integrals needed in equation |i] are listed 
in Table [T] for s and p functions. The integrals are rotated from a local ideal coordinate 
system onto the molecular coordinate system. The local coordinate system is defined by 
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the distance between the atom A containing the basis functions fiu and the tessera i 



A . T A L a a a 

-R = ITT = -p,{Rxi Ry, Rz) = (Rx, Ry, Rz 

\r A -ri\ R 



u = -(-R y ,R x ,0), 



w = R x u. 



(5) 

(6) 
(7) 



and the four unique integrals from Table [I] are [T5] 



[s's'\ss] 



(s's'\sp a ) 



[s's'\p a p a ) 
[s's'IpkPk) 



r A - n 



1 



\fA-Ti\-Dx \f A -f i \+D 1 

1 1 



\r A -n\ + 2D 2 \f A -fi\-2D 2 
1 1 



y/\f A -ri\ 2 + 4Dl 



r A - n\ 



(8) 
(9) 
(10) 

(11) 



Here, D\ and D 2 are empirical parameters describing charge-separation for the multipoles. 
They are defined elsewhere. [15] Following Chudinov et al. [9] the density parameters ai 
are set to zero in this work and are therefore not shown in the equations. 

The electrostatic interaction of the ASCs q on the surface and the molecule is treated 
by introducing the following one-electron contribution to the Fock matrix 



F' = F +V 



(12) 



where 



i 



(13) 
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Finally, the PCM electrostatic interaction free energy is calculated as 



G = V • q. (14) 



Optimization of the molecular geometry in the PCM field requires the derivative of G 
with respect to an atomic coordinate A x 

dG dV T e 1 T dC 

dAr^A^—e-^dA^ (15) 

the last term is computed analytically |16j . The derivative of the potential with respect 
to an atomic coordinate is done analytically and we give explicit expressions for all terms 
in Text SI. 



Methods 

Computational Details 

The semi-empirical/PCM interface was implemented in a locally modified version of 
GAMESS [5]. The semi-empirical energy and gradient evaluations were allowed to run 
in parallel but no efforts were made to parallelize the integral evaluation or the assembly 
of the Fock matrix since the diagonalization is the major computational bottle-neck for 
large systems. The evaluation of the electrostatic potential (equation [3]) and its deriva- 



tive (equation 15) was parallelized. We note that the remaining semi-empirical integral- 
derivatives in GAMESS is evaluated numerically. 

We compared our implementation to that of Chudinov et al. for twenty smaller ammo- 
nium and oxonium type molecules used in that study. The structures were generated from 
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their SMILES string (see Table [2] and Table [3]) using Open Babel [T7J [18] and optimized 
in the gas phase and afterwards using the newly implemented code. 

Geometry optimizations used a convergence threshold of 5.0 ■ 10~ 4 Hartree Bohr -1 
(OPTTOL=5.0E-4 in SSTATPT). To verify the minima, hessians were calculated for 
all optimized geometries by double difference (NVIB=2 in $FORCE). When using PCM 
for geometry optimizations the FIXPVA [19] tessellation scheme was used (MTHALL=4 
in STESCAV) and the tesserae count for each sphere was set to 60 (NTSALL=60 in 
STESCAV). For solvation free energies the tesserae count was raised to 960 (NTSALL=960 
in STESCAV) and the GEPOL-GB (Gauss-Bonet) [?] tessellation scheme (MTHALL=1 
in STESCAV) was used. 

The Mean Absolute Deviations (MADs) of vibrational frequencies between solvated 
(s) and gas-phase (g) calculations were calculated by 



We also carried out single point energies and gradients calculations for Chignolin 
(PDB: 1UAO), Trypthophan-cage (PDB: 1L2Y), Crambine (PDB: 1CRN), Trypsin In- 
hibitor (PDB: 5PTI) and Ubiquitin (PDB: 1UBI). The proteins were all protonated with 
PDB2PQR[20l ETJ and PROPKA[22] at pH = 7 yielding overall charges of -2, 1, 0, 6 and 
respectively. Either no convergence acceleration, Direct Inversion of the Iterative Sub- 
space [?] (DIIS=.T. in SSCF) or Second-Order Self Consistent Field [?, ?] (SOSCF=.T. 
in SSCF) was used. In all cases the C-PCM equation was solved iteratively. [23J The 
timings were performed on up to 24 cores on AMD Optirun 6172 shared-memory CPUs. 
The method is included in the latest release of the GAMESS program. 




n 



(16) 



i=l 
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Results 

Electrostatic Solvation Free Energies 

The electrostatic solvation free energies are presented in Tables [2] and [3] for ammonium 
and oxonium species calculated using PM3/PCM and compared to results published by 
Chudinov et al. [9] In general, our results underestimate the electrostatic solvation free 
energy by an average of -1.3 kcal mol -1 and -1.9 kcal mol -1 . The main source of the 
difference is likely the fact that Chudinov et al. uses the original PCM implementation 
of Miertus, Scrocco and Tomasi [?] often referred to a D-PCM) while we use the C-PCM 
implementation. The solvation free energies from these implementations can differ by 
several kcal / mol even for neutral molecules [?] . (While the reference describes a compar- 
ison of D-PCM to IEF-PCM, IEF-PCM and C-PCM yield nearly identical solvation free 
energies for water.) Another likely source of error is that we use the GEPOL-GB scheme 
where Chudinov et al. uses a more elaborate scheme to reach convergence of the solvation 
free energies by subdividing the surfaces incrementally. 

Vibrational Frequencies 

To test the numerical accuracy of the PCM gradients we optimized the molecules listed in 
Tables [2] and [3] As indicated in Table [4] three of the geometry optimizations (Al, 01, and 
02) do not converge. Al can be made to converge by skipping the update of the empirical 
Hessian matrix (UPHESS=SKIP) but this does not appear to be a general solution to 
the problem. While some gradient components in these minimizations are quite large the 
optimizing algorithm eventually settles on a zero step size causing the optimization to 
effectively stall. The cause of this behavior is not clear since it is only observed for the 
smallest systems and was not investigated further. The resulting geometries still lead to 
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a positive definite Hessian and the frequencies are not unusually different from the gas 
phase values. 

In four cases (A7, 04, 06, 08 and 09) the vibrational analyses yields imaginary fre- 
quencies between 26 and 200 cm -1 . In the case of 08 and 09 this also occurs for the 
RHF/STO-3G calculations and in the case of 07-09 this also occurs for PM3 structures 
optimized in the gas phase. In most cases the imaginary frequency is associated with 
the 0+ ion and a neighbouring methyl group. The most likely source of these imagi- 
nary frequencies is a flat PES associated with the 0+ group combined with numerical 
inaccuracies in the PCM and PM3 gradients. 

Timings 

In Table [5] we show absolute timings for single point energy and gradient evaluations 
of proteins either in the gas phase, using DIIS to obtain convergence, or by including 
the PCM field either with or without SCF convergence acceleration. None of the listed 
proteins converged in the gas phase without DIIS and even then the SCF converged only 
for the three smallest proteins: Chignolin, Tryptophan-Cage and Crambine. 

The cost of optimizing the wavefunction in PCM is between two (Crambine) and three 
(Chignolin and Tryptophan-cage) times more expensive than without. For Chignolin, 
which is the smallest protein in our test set, it took 21 SCF iterations to converge in 
PCM while only 13 for PCM/DIIS and 14 for PCM/SOSCF. The other proteins converged 
within 17 iterations without convergence acceleration and within 14 iterations with. For 
absolute timings regarding larger proteins, Crambine, Trypsin Inhibitor and Ubiquitin 
finished in 1293, 3455 and 6732 seconds with PCM without convergence acceleration, 
but are slower (1314, 3649 and 8777 seconds, respectively) with PCM and DIIS enabled. 
Using SOSCF did not result in an appreciable decrease in CPU time. The increase in 
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CPU time when using DIIS is due to the extra matrix operations associated with this 
method, which represent the computational bottleneck for sem-empirical methods. 



Evaluating the ASC potential derivative (equation 15) analytically has a negligible 
computational cost compared to evaluating the wavefunction as can be seen from the last 
column of Table [5] 

The relative speedup from running in parallel in the gas phase is shown on Figure SI 
where no improvement is observed beyond 4 cores (with a speed up factor of 3) and is 
not discussed further. The PM3/PCM timings (Figure [T]) show better improvement when 
utilizing multiple cores for all systems. The smaller systems obtain some improvement 
(a factor 3.4 and 4.2 for Chignolin and Tryptophan-cage, respectively) whereas the larger 
systems sees improvements of 5.7, 5.7 and 5.9 for Crambine, Trypsin Inhibitor and Ubiq- 
uitin, respectively. In all cases maximum speed up is reached for 16 cores because the use 
of 24 cores introduces some communication overhead which degraded performance. 



Conclusion and Outlook 

An interface between semi-empirical methods and the polarized continuum model (PCM) 
of solvation successfully implemented into GAMES S following the approach by Chudinov 
et al. [9] The interface includes energy gradients and is parallelized. 

For very small systems we found some numerical instability problems in the gradient 
which caused geometry convergence failure, but geometry optimization appears robust for 
larger molecules. The use of PCM occasionally introduces imaginary frequencies in the 
Hessian analysis, but this was also found for RHF/STO-3G PCM calculations and even in 
a few semi-empirical gas phase calculations so these problems do not appear to be specific 
to the to the current implementation. We therefore consider the current implementation a 
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working code for all practical purposes, but welcome feedback from readers who encounter 
numerical stability problems for large molecules 

For semiemprical methods the most time CPU-intensive part of the calculation remains 
the solution of the SCF equations. This part of the code was already parallelized in 
GAMESS and we show, for the first time, that this implementation applies to semi- 
empirical methods and the new PCM interface. For large molecules such as Ubiquitin a 
reasonable speedup (up to a factor of six) is observed for up to 16 cores. 

It will be interesting to see how much the numerical stability and computational 
efficiency will improve once the interface is combined with the recently developed FIX- 
SOL/FIXPVA2 method developed by Li and coworkers [21]. We are currently working on 
implementing the PM6 method in GAMESS to further increase the accuracy and range 
of application that this new interface offers. 
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Figure 1. Speedup by using multiple cores with PCM enabled for single gradient 
evaluation. 
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Table 2. Predicted electrostatic solvation free energies of ammonium type 
molecules. 



Ref PM3/PCM RHF/STO-3G/PCM 



[NH4+] 


Al 


83.9 


82.4 


(-1.5) 


78.6 


(-3.8) 


C[NH3+] 


A2 


73.7 


72.6 


(-1.1) 


71.3 


(-1.3) 


CC[NH3+] 


A3 


70.2 


69.2 


(-1.0) 


68.6 


(-0.6) 


CCC[NH3+] 


A4 


69.9 


68.5 


(-0.8) 


67.6 


(-1.0) 


CC([NH3+])C 


A5 


67.1 


65.9 


(-1.2) 


66.2 


(0.3) 


CCCC[NH3+] 


A6 


69.3 


68.3 


(-1.0) 


67.1 


(-1.2) 


CC([NH3+])(C)C 


A7 


64.1 


62.8 


(-1.3) 


67.1 


(1.2) 


C[NH2+]C 


A8 


65.9 


64.4 


(-1.5) 


65.3 


(0.9) 


CC[NH2+]CC 


A9 


59.5 


58.0 


(-1.5) 


60.7 


(2.7) 


C[NH+](C)C 


A10 


59.7 


57.7 


(-2.1) 


61.8 


(4.2) 



AVG -1.3 



Obtained results using PM3/PCM compared with results by Chudinov et al. (labelled 
"Ref) and RHF/STO-3G/PCM results. PM3/PCM numbers in parenthesis are 
deviations to the reference. RHF/STO-3G deviations are taken to PM3/PCM results. 
All numbers are in kcal mol -1 . 

Table 3. Predicted electrostatic solvation free energies of oxonium type 
molecules 



Ref PM3/PCM RHF/STO-3G/PCM 



C[OH2+] 


01 


74.1 


72.6 


(-1.5) 


73.7 


(1.1) 


CC[OH2+] 


02 


69.2 


67.1 


(-2.1) 


70.2 


(3.0) 


C[OH+]C 


03 


65.1 


63.4 


(-1.7) 


65.5 


(2.1) 


C[OH+]CC 


04 


61.1 


59.0 


(-2.1) 


62.5 


(3.5) 


ClC[OH+]CCl 


05 


59.6 


57.3 


(-2.3) 


61.0 


(3.8) 


CC[OH+]CC 


06 


57.4 


55.4 


(-2.0) 


59.8 


(4.1) 


C[OH+]clcccccl 


07 


54.5 


53.3 


(-1.2) 


57.4 


(4.4) 


CC(=[OH+])C 


08 


62.5 


60.0 


(-2.5) 


64.3 


(4.3) 


CC(C)C(=[OH+])C(C)C 


09 


53.2 


51.0 


(-2.2) 


56.0 


(5.0) 


COC(=[OH+])C 


O10 


60.0 


58.7 


(-1.3) 


62.6 


(3.9) 



AVG -2.0 



Obtained results using PM3/PCM compared with results by Chudinov et al. (labelled 
"Ref) and RHF/STO-3G/PCM results. PM3/PCM numbers in parenthesis are 
deviations to the reference. RHF/STO-3G deviations are taken to PM3/PCM results. 
All numbers are in kcal mol -1 . 
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Table 4. Optimization steps and frequencies for solvated molecules. 







Asteps 


MAD [cm" 1 ] 




PM3 


RHF/ST0-3G 


PM3 


RHF/ST0-3G 


Al 


- 


14 


135.1 


131.9 


A2 


9 


10 


121.5 


90.8 


A3 


6 


8 


64.6 


39.2 


A4 


6 


18 


25.7 


37.9 


A5 


4 


17 


16.9 


24.6 


A6 


10 


9 


30.4 


15.5 


A7 


32 


32 


24.8 a 


22.3 


A8 


25 


24 


56.2 


32.8 


A9 


34 


19 


27.3 


31.3 


A10 


32 


18 


58.3 


62.2 


01 




6 


151.8 


60.1 


02 




8 


111.5 


36.2 


03 


15 


8 


96.8 


57.0 


04 


6 


8 


67.1 a 


28.3 


05 


11 


9 


85.6 


29.8 


06 


15 


11 


56.0 a 


54.5 


07 


6 


6 


50.1 


24.5 


08 


11 


7 


87.7 a 


22.0 a 


09 


6 


8 


28.8 a 


12.6 a 


010 


3 


6 


20.8 


19.9 



Number of optimization steps for PM3/PCM and RHF/STO-3G/PCM optimizations 
along with Mean Absolute Deviations (MADs) of vibrational frequencies when going 
from gas phase to a solvated molecule for all 20 small molecules tested in this work. All 
optimizations were done in Cartesian coordinates. Translational and rotational 
frequencies are not included. Dashes marks unconverged structures after 100 
optimization steps. a marks optimized structures with at least one imaginary frequency. 
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